Development of an affective database made of interactive virtual environments

Despite the great potential of Virtual Reality (VR) to arouse emotions, there are no VR affective databases available comparable to those that exist for pictures, videos, and sounds. In this paper, we describe the validation of ten affective interactive Virtual Environments (VEs) designed to be used in Virtual Reality. These environments are related to five emotions. The testing phase used two different experimental setups to deliver the overall experience. Neither setup included any immersive VR technology, because of the ongoing COVID-19 pandemic, but the VEs were designed to run on stereoscopic visual displays. We collected measures of the participants' emotional experience based on six discrete emotional categories plus neutrality, and we included an assessment of the sense of presence related to the different experiences. The results showed that the scenarios can be differentiated according to the emotion aroused. Finally, the comparison between the two experimental setups demonstrated high reliability of the experience and strong adaptability of the scenarios to different contexts of use.

www.nature.com/scientificreports/
which provide experimenters with a high level of control over the presentation methods and require relatively few cognitive resources from participants, making them usable with a great variety of populations. However, despite their diffusion, static images have some limitations, primarily linked to their ecological validity and the intensity of the emotion elicited. The relationship between sound and emotions has also been widely explored 20 , and affective databases are also available for this sensory modality (e.g. IADS 21 ). The study of auditory stimuli includes verbal and non-verbal sounds, as well as music. A set of descriptors suitable for evaluating all of them is presented by Weninger et al. 22 . Words' phonetic features can make them sound pleasant or harsh, influencing their affective rating and also their meaning 23 . The Oxford Vocal (OxVoc) database contains non-verbal sounds produced by adults, children, and animals emitting sad, happy, and neutral vocal expressions 24 . Considering music, fast tempo and major mode are linked to positive and active emotions, while slow tempo and minor mode are associated with negative and passive ones 25 . To overcome unisensory limitations, in the 90s, Gross et al. 26 developed the first affective movie database, made up of sixteen films eliciting amusement, anger, contentment, disgust, sadness, surprise, neutrality, and, slightly less successfully, fear. Their work has been followed by more recent studies (e.g. [27][28][29]), which also differ according to the elicited emotional states. Movie clips surely offer a more involving experience than static images; however, they are generally passive. The viewer does not control or interact with the presented events; such limitations can be overcome using VR tools.
Although the use of emotional and affective media content has led to the creation of several databases of images, sounds, and videos, to the best of our knowledge there are still no databases or design guidelines covering VR experiences and environments that raise emotional responses. In recent years, there have also been attempts to create affective databases through 360° video clips. In the work conducted by Li et al. 30 , 73 selected video clips were rated on the Valence and Arousal dimensions using the Self-Assessment Manikin scale, finding a correlation between valence ratings and the standard deviation of head yaw. The video clips were played using a Head-Mounted Display (HMD), providing a good level of immersion and allowing experimenters to track head movements. Similarly, Jun et al. 31 used an HMD and immersive video clips to collect, among others, data related to presence, arousal, and simulator sickness on a large participant sample. Nevertheless, the contents considered in these works were video clips selected by the authors from the public domain and did not allow participants to interact with the content presented.
Virtual environments could represent the most effective way to arouse emotions because they exploit the benefits of being multisensory and include the possibility of interaction, which is an important element for users' involvement. The novelty and the purpose of the current work are to provide a database of validated interactive virtual environments (VEs) related to five distinct emotional categories to be used for the development of affective VR experiences. We followed a categorization based on five emotions (i.e. anger, fear, disgust, sadness, and happiness) considered to be shared universally, meaning-wise 32 . The ten newly developed affective VEs were tested on two experimental setups (Exp. 1 and Exp. 2), differentiated by the technology used to deliver the virtual experience. At this stage, neither experimental setup was based on immersive VR technologies, due to the ongoing COVID-19 pandemic. However, we can define the interactive VEs as scenarios that can be navigated by the users and respond to their movements within the virtual space. For this reason, they can easily be exported to other platforms, in particular to VR headsets. The comparison showed consistent results, suggesting that the environments work independently of the rendering technology.

Methods
The study was conducted simultaneously in two experimental setups (i.e., Exp. 1 and Exp. 2) differentiated by the technology used to render the VEs. The experiments followed a within-subjects design where the five emotional entities used to categorise our affective VEs, plus neutrality and the baseline, constituted our experimental conditions.
Participants. The study involved a total of 75 participants, who voluntarily took part in the experiment. In both experimental setups, each participant provided written informed consent for study participation. Written consent and all methods were carried out in accordance with the principles of the Helsinki Declaration.
Affective VEs. We used eleven VEs for the experiment. Ten scenarios, two for each of the five basic emotions (i.e. anger, fear, disgust, sadness, happiness), were newly developed. The additional scenario, for neutrality, is described in 33 .
Happy scenarios The two happiness scenarios were designed differently: one aimed at creating happiness through relaxation and the other through fun and excitement. Both scenes have positive valence, but they differ in arousal. The first scene consisted of a sunset on a tropical beach with palm trees; the user has to walk towards and into the calm sea (Fig. 1a). The second scene featured a fantastic rainbow world with colourful bubble lights and funny animals and vegetation (Fig. 1b).
Sad scenarios The first consisted of an abandoned hospital, with people suffering and crying (Fig. 1c). The second was a post-apocalyptic scene, with industrial buildings, dead animals and vegetation (Fig. 1d).
Angry scenarios The first was a school during a fire, with smoke preventing users from seeing their surroundings clearly (Fig. 1e). The second was a labyrinth garden with high fences (Fig. 1f). In both scenes, users have to find
Measures. Empathy. Before taking part in the study, participants were asked to fill in a questionnaire that aimed to assess their empathy level. We used a translated and validated Italian version of the Balanced Emotional Empathy Scale (BEES) 34 .
Emotion ratings. After each scenario presentation, participants were asked to rate, on seven continuous scales from 0 to 100, the intensity with which they associated emotional labels corresponding to six basic emotions 32 , plus neutrality, with the presented scenario. The labels associated with the scales were happiness, sadness, anger, fear, disgust, surprise, and neutrality.
The numbering of the Version 3.0 items reflects the original numbering of Version 2.0, from which items 26, 27 and 28 have been removed.
Since some of the items referred to the assessment of tactile/haptic feedback (i.e. items 13, 17, 29) or focused on measuring specific interaction dynamics or activities that were neither feasible nor required during our exploration tasks (i.e. items 7, 9, 15, 16, 23, 24, 31), we decided to exclude them from data collection, resulting in an adapted version of the questionnaire consisting of 19 items. We used Cronbach's alpha to measure the reliability of our modified 19-item scale, which was still acceptable (alpha = 0.83; N = 336). We also computed internal consistency reliability coefficients for the original four factors with the remaining 19 items.
Procedure. The testing was performed simultaneously in two setups, differentiated by the technology used to render the virtual environments. Before the scenario presentation, participants were requested to read and sign a consent form. Then, they were instructed about the structure of the study and the composition of the questionnaires. After the explanation, they were asked to first assess their current emotional state on the seven emotional labels, which constituted our baseline. We then gave them a mouse and a keyboard. We proceeded with a brief training session in a simple virtual environment developed with Unity Software (version 2019.4.13f1), consisting of a floor and some geometric 3D obstacles. Participants controlled their movements with the WASD keys and oriented their point of view with the mouse. When they felt confident with the controls, participants communicated that they were ready to start the exploration session, in which they were asked to navigate and explore seven scenarios, one for each discrete emotion plus the neutral scenario. Each scene had a fixed duration of 90 seconds. Scenario selection and presentation order were standardised and counterbalanced between participants using a Latin Square design.
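The counterbalancing step can be illustrated with a minimal sketch. The text does not state which Latin Square construction was used, so the cyclic square below is an assumption, as are the function names `cyclic_latin_square` and `order_for`; the condition labels are taken from the data analysis description.

```python
def cyclic_latin_square(conditions):
    """Return an n x n Latin square: row r is the list rotated left by r.

    Assumption: a simple cyclic construction, not necessarily the one
    used in the study.
    """
    n = len(conditions)
    return [[conditions[(r + c) % n] for c in range(n)] for r in range(n)]


def order_for(participant_index, square):
    """Assign a presentation order by cycling through the rows of the square."""
    return square[participant_index % len(square)]


# Condition labels as listed in the data analysis section.
CONDITIONS = ["Neutral", "Happy", "Sad", "Scary", "Angry", "Disgusting"]
SQUARE = cyclic_latin_square(CONDITIONS)

# Print the presentation order for the first three (hypothetical) participants.
for participant in range(3):
    print(participant, order_for(participant, SQUARE))
```

In a cyclic square every condition appears exactly once in each serial position, but the same condition always follows the same predecessor; a balanced (Williams) design would be needed to also control first-order carryover effects.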
After each scenario, participants were asked to rate the seven emotional labels and, in Exp. 1, to complete our adapted version of the Presence Questionnaire. In Exp. 2, Presence measures were collected only once, at the end of the experiment. The assessment phase between the presentation of the scenarios lasted 5 min on average, and this break, combined with the counterbalanced order of the scenarios, helped us counteract any residual emotional bias between different conditions. The entire experimental session lasted about 45 min. All data were analysed and reported anonymously.
Experimental setups. Exp. 1: participants were asked to sit on a fixed chair located in front of a wall screen (width 3 m; height 2.5 m) at a distance of 2 m from the projection wall. VR environments were projected using a projector (resolution 1920 × 1080p; framerate 60 fps) connected to a PC (GPU: NVIDIA GTX 2070 Max-Q; CPU: Intel i7; RAM 32 GB). The audio was played through over-ear headphones connected to the PC. The use of HMDs was discarded due to the general laboratory safety practices during the COVID-19 pandemic. Exp. 2: participants were seated on a fixed chair located in front of a desktop computer at a distance of 70 cm from it. VR scenarios were displayed on a monitor (screen size: 27 inches; resolution: 2560 × 1440) connected to a PC (GPU: NVIDIA QUADRO P3200; CPU: Intel(R) Xeon(R) E-2186M @ 2.90GHz; RAM: 32 GB). Audio was played through a speaker connected to the PC. In this experiment an RGB-D camera was added to capture facial expressions for future analyses on Facial Expression Recognition (FER). The use of HMDs was discarded in order to reduce occlusion issues and because of the general laboratory safety practices during the COVID-19 pandemic.
Although head tracking technology might have improved interaction with the VEs, we decided not to use it. In Exp. 1, participants were asked to sit on a chair, so head position was not expected to change significantly. As for head orientation, with large wall displays rotations are usually ignored and only displacements are tracked, because the user is expected to look at the single available wall to view the information. In Exp. 2, we included an RGB-D camera, and the user was expected not to move or rotate the head, to simplify data acquisition.
Data analysis. For the empathy score, we performed a mixed repeated measures ANOVA with our experimental conditions as within-subjects factor, the emotional labels as measures and the questionnaire score as a between-subjects variable.
Two normality tests (i.e. Kolmogorov-Smirnov, Shapiro-Wilk) were carried out to determine whether the data related to the emotional labels were normally distributed. They did not follow a normal distribution, so we proceeded with a non-parametric Friedman test for differences among repeated measures. Then, a post hoc analysis with the Wilcoxon signed-rank test was conducted. Since multiple comparisons increase the probability of committing Type I errors, a Bonferroni correction was applied to lower the critical p-value depending on the number of tests performed. We had 7 conditions (including baseline), resulting in 21 (= 7 × 6/2) [N(N-1)/2] possible combinations; therefore, we adjusted the significance level to 0.002 (= 0.05/21) 36 .
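The arithmetic behind the adjusted significance level can be checked with a short sketch. The function name `bonferroni_alpha` is hypothetical and only illustrates the N(N-1)/2 count and the division of the nominal alpha described above.

```python
from itertools import combinations


def bonferroni_alpha(n_conditions, alpha=0.05):
    """Return (number of pairwise comparisons, Bonferroni-adjusted alpha)."""
    # Every unordered pair of conditions is one post hoc comparison.
    n_pairs = len(list(combinations(range(n_conditions), 2)))
    return n_pairs, alpha / n_pairs


# Seven conditions, including the baseline, as in the text.
n_pairs, adjusted = bonferroni_alpha(7)
print(n_pairs, round(adjusted, 3))
```

With seven conditions this gives 21 comparisons and an adjusted level of about 0.0024, which matches the reported 0.002 after rounding to three decimals.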
For Presence rates, we performed two normality tests (i.e. Kolmogorov-Smirnov and Shapiro-Wilk), and the results showed that the data related to the four factors of the Presence Questionnaire were normally distributed. We therefore proceeded with one-way repeated measures ANOVAs (conditions: Neutral, Happy, Sad, Scary, Angry, Disgusting) with our four factors as measures (i.e. Involvement, Sensory fidelity, Adaptation/Immersion, and Interface quality). We then performed post hoc comparisons using the Bonferroni correction (alpha = 0.05).
Regarding the comparison between the two setups, data related to the emotional labels were not normally distributed. Therefore, we performed a Mann-Whitney U test for each emotional label to compare the results obtained in our experimental conditions between the two setups.
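For readers unfamiliar with the between-setup test, the following minimal sketch computes the Mann-Whitney U statistic with a normal approximation for Z and a two-sided p-value. It is a simplified illustration under stated assumptions: no tie or continuity correction is applied, so values may differ slightly from those of a statistics package, and the ratings used in the example are hypothetical, not data from the study.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, Z, two-sided p) for two independent samples.

    Assumption: plain normal approximation, without tie or continuity
    correction.
    """
    n1, n2 = len(x), len(y)
    # Count pairs where x exceeds y; ties count as one half.
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p


# Hypothetical ratings (0 to 100) for one label in one condition, per setup.
exp1_ratings = [10, 25, 40, 55, 70]
exp2_ratings = [15, 30, 45, 60, 80]
print(mann_whitney_u(exp1_ratings, exp2_ratings))
```

Because the smaller of the two U values is used, Z is never positive in this sketch, which is consistent with the negative Z values reported for the comparisons between setups.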
Finally, to compare the Presence scores between the two experimental setups, we computed an overall Presence score for Exp. 1 using the data collected for each condition. We then performed a one-way ANOVA to compare the overall Presence score and the four constitutive factors (i.e., Involvement, Sensory fidelity, Adaptation/immersion, Interface quality) between the two setups.

Emotion ratings.
Participants were asked to rate, on seven continuous scales from 0 to 100, the intensity with which they experienced six basic emotions 32 , plus neutrality, in the presented scenario. The labels associated with the scales were happiness, sadness, anger, fear, disgust, surprise, and neutrality. The experiments followed a within-subjects design where the five emotional entities used to categorise our VR scenarios, plus the neutral scenario and the baseline, constituted our experimental conditions. The results showed a statistically significant difference for each scale depending on which scenario was presented (i.e., the experimental condition). More specifically:
Presence. Participants were asked to fill in our adapted version of the Presence Questionnaire after each condition only during Exp. 1. For this reason, the following analysis considers only data from Exp. 1, while data from Exp. 2 are considered only for the comparison between the two setups. Presence mean rates for each condition are shown in Fig. 3.
(Fig. 4f) conditions. In addition, Surprise rates differed only in the Happy condition [U = 443; Z = − 2.265; p = 0.023] (Fig. 4b). For the other environments participants' rates were very similar and only Anger rates showed a small difference in the Happy condition [U = 465; Z = − 2.697; p = 0.005] (Fig. 4b).
Despite the few differences that emerged from the comparison, it is important to highlight that the overall trend followed by participants' rates in our scenarios was consistent between the two samples. As visible from Tables 1 and 2, the comparisons for each label follow the same trend in both the experimental setups, except for Neutrality rates in the Neutral condition, which behave differently from Exp. 1 to Exp. 2.
Presence. Results did not show significant differences between the two experimental setups

Discussion
This study tested ten new affective VEs categorised according to five emotional entities (i.e., happiness, sadness, fear, anger, disgust), plus a previously validated environment for neutrality 33 . The testing was carried out using two experimental setups, differentiated by the technology used to reproduce the virtual environments. We then individually analysed the data collected in the two experimental setups and compared them to explore the reliability of the virtual environments in arousing the same emotional response using different technologies.
To check for possible biases in our participant sample, we used a validated scale (i.e. BEES) to assess empathy. Results showed a marginal effect of this measure on participants' Sadness and Anger rates only in the Sad, Angry and Scary conditions.
Overall, the results confirm the effectiveness of our affective VEs in eliciting specific emotions, with a clear distinction between the scenarios related to Happiness, Anger, Fear, Sadness, and Disgust. The scenarios were rated significantly higher on the corresponding emotional label than on all the other available scales. The environment used to elicit Neutrality received significantly higher ratings on the corresponding label only in Exp. 1, while less defined results were obtained in Exp. 2, where we observed higher Happiness rates. However, the Neutral environment shared design elements with our Happy scenarios, and this may explain the overlap observed in our results.
Furthermore, the comparison between the two experimental setups confirms the high reliability and versatility of our environments in eliciting specific emotions even using different technological means to reproduce them. In fact, the results obtained in the two experiments are highly similar and present only a few exceptions, especially when considering the Neutral condition. For all the other emotional entities considered, the comparison showed the same trend shared between the two experimental setups and no statistically significant differences for the highest rated emotion in each condition, except for the Neutral scenario.
Regarding presence measures, we collected these data with two main objectives: (1) to assess whether two experimental setups with different levels of immersion provided different levels of presence; (2) to explore differences linked to the emotional component elicited in each environment. We did not find significant differences between the two setups, but results from Exp. 1 showed that our different conditions affected all four factors constituting the Presence Questionnaire adapted for this study. The less involving scenarios were those meant to elicit Anger. Since the underlying mechanism followed in those scenarios was to frustrate users by continuously making them fail and restart, this is an expected and justified result. The most involving environments were those linked to Fear, as opposed to the least involving scenario, the Neutral one, which also showed the lowest level of Sensory Fidelity. However, this was an expected result, since the constitutive items of this factor were mainly related to auditory features, which were intentionally absent in this condition.

Conclusions
Our initial validation supports the possible application of the developed VEs in studies on emotions that involve the use of interactive and dynamic scenarios. Moreover, our results suggest that hardware technologies and setups can be adapted to the context and the experiment requirements without compromising the scenarios' effectiveness. Therefore, these environments can be considered an initial step in developing a VR affective database capable of covering five distinct emotional entities. Future developments will need to increase the level of immersion by using fully immersive technologies such as HMDs or CAVE systems, and to introduce a more detailed analysis of the emotional experience, for example, through the collection of physiological data and the analysis of facial expressions.